A comparison of clustering quality indices using outliers and noise

Authors

  • Luis Guerra
  • Víctor Robles
  • Concha Bielza
  • Pedro Larrañaga
Abstract

Quality indices in clustering are used not only to assess the quality of the partitions but also to determine the number of clusters in the final result. When these indices are evaluated in a case study, real data conditions or different clustering algorithms are seldom taken into account. Here, some of the standard indices used in the literature are compared using more realistic databases that include outliers or noisy dimensions, which is more like a real problem-solving approach. Besides, three different clustering methods are used in an attempt to identify different behaviours. Also, the performance of the quality index-clustering algorithm tandem is compared to random grouping, with the aim of running an additional check. The indices are ranked, and index-based conclusions are drawn for all the scenarios.
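
The comparison described in the abstract can be illustrated with a small sketch. The abstract does not name the indices or clustering methods it uses, so the choices here are assumptions made only for illustration: the silhouette width stands in for a quality index, a plain Lloyd's k-means stands in for a clustering method, and "random grouping" is taken to mean shuffling the cluster labels so that cluster sizes are preserved. All function names are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-point start (an assumed stand-in)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties out
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def random_grouping(labels, rng):
    """Baseline: the same cluster sizes, assigned to points at random."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def compare_to_random(points, k, seed=0):
    """Return (index for the k-means partition, index for the random grouping)."""
    rng = random.Random(seed)
    labels = kmeans(points, k)
    return silhouette(points, labels), silhouette(points, random_grouping(labels, rng))


if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic Gaussian blobs, then the same data with a few far-away outliers.
    blobs = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
             for cx, cy in [(0.0, 0.0), (8.0, 8.0)] for _ in range(30)]
    outliers = [(rng.uniform(-30, 30), rng.uniform(-30, 30)) for _ in range(3)]
    for name, data in [("clean", blobs), ("with outliers", blobs + outliers)]:
        index, baseline = compare_to_random(data, k=2)
        print(f"{name}: k-means {index:.3f}, random grouping {baseline:.3f}")
```

Running the script prints the index for the clean data and for the data with outliers next to the random-grouping baseline. The synthetic data, the number of outliers, and the seed are arbitrary and are not taken from the paper.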

Similar resources

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

Bilateral Weighted Fuzzy C-Means Clustering

The Fuzzy C-Means method has become one of the most popular clustering methods based on the minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy C-Means (BWFCM). We used a new objective function that uses some k...

A robust wavelet based profile monitoring and change point detection using S-estimator and clustering

Some quality characteristics are well defined when treated as response variables and are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, in practical applications, the complexity of many processes means it is usually not possible to model a process using parametric models. In these cas...

Investigation of outliers of evaluation scores among school of health instructors using outlier-determination indices

Introduction: Teacher evaluation, as an important strategy for improving the quality of education, has been considered by universities and leads to a better understanding of the strengths and weaknesses of education. Analysis of instructors’ scores is one of the main fields of educational research. Since outliers affect analysis and interpretation of information processes both structurally and concep...

A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...

Journal:
  • Intell. Data Anal.

Volume 16

Publication date: 2012